This script analyses gene set enrichment results from running Gowinda. I highlight cases of nominal (FDR<=0.1) and significant (FDR<=0.5) enrichment.
Results from each gene set database tested are given under subtitles. I also write a little about interesting patterns I identify under the graphs with notes on the functions of gene sets and the signal detected.
The bar charts show the FDR for each gene set, the colour of the bars also corresponds to the FDR so shorter bars (i.e. smaller FDRs) are more red just to aid identification of significant results when looking at lots of graphs together. On the right of the graph I also give the expected and observed number of hits to assess the effect size.
Each row corresponds to a different subspecies dataset (all=‘all samples’, ce=‘central and eastern’, n=‘Nigeria-Cameroon’, w=‘western’) and each column corresponds to a different empirical p value threshold (0.5%, 0.1%, 0.05%).
I also plot Venn diagrams to get an understanding of whether enrichment of multiple gene sets are independent signals or not.
rm(list=ls())
knitr::opts_knit$set(root.dir = '~/OneDrive - University College London/Projects/Ostridge_PanAf/gowinda/baypass_core')
library(data.table)
options(datatable.fread.datatable=FALSE)
library(dplyr)
library(ggplot2)
library(wesanderson)
library(VennDiagram)
library(UpSetR)
library(tidyverse)
library(ggh4x)
source("scripts/plot_gowinda.R")
gowinda_out_dir="output/gowinda_output/"
stat='fpr'
subsps=c('all', 'ce', 'n', 'w')
if(stat=='p'){
tails=c('0.5pct-cov_cor', '0.1pct-cov_cor', '0.05pct-cov_cor')
stat_name="Empirical tails"
}else if(stat=='fpr'){
tails=c('fpr0.5pct.non-genic_1000bp.flanks', 'fpr0.1pct.non-genic_1000bp.flanks', 'fpr0.05pct.non-genic_1000bp.flanks')
stat_name="Candidate FPR tails"
}else{
stop("Define stat as either 'p' (empircical p value) or fpr (FPR)")
}
plot_gowinda_all=function(subsps, tails, gene.set){
for(subsp in subsps){
cat("-", subsp, "\n")
for(tail in tails){
gowinda_output_file=paste0(gowinda_out_dir, "/",subsp,"/",subsp,".",tail,".gowinda_out.", gene.set)
if(file.exists(gowinda_output_file) & file.size(gowinda_output_file) > 0){
df=fread(gowinda_output_file)
if(nrow(df>0)){plot_gowinda(df, title=paste0(subsp, ": ", "\n", tail, " candidates"))}
}
}
}
}
## - all
## - ce
## - n
## - w
Signal
## ce go fpr0.1pct.non-genic_1000bp.flanks
## - all
## - ce
## - n
## - w
Signal
## - all
## - ce
## - n
## - w
## - all
## - ce
## - n
## - w
## - all
## - ce
## - n
## - w
Signal
## w expr fpr0.1pct.non-genic_1000bp.flanks
## - all
## - ce
## - n
## - w
## - all
## - ce
## - n
## - w
We do not really see any strong signals when running hypothesis free tests using large gene function databases. Our previous work has also pathogens to be important section pressures in chimps at different time scales and so we decided to run some more hypothesis driven tests, investigating enrichment of genes we know are important in pathogen infection.
## - all
## - ce
## - n
## - w
Function
Signal
## - all
## - ce
## - n
## - w
## w pathogen fpr0.05pct.non-genic_1000bp.flanks
## - all
## - ce
## - n
## - w
## - all
## - ce
## - n
## - w
datasets=c("go", "kegg", "reac", "gwas", "expr", "phen", "vips", "pathogen_ebel2017", "imm", "dehy")
#```{r echo=FALSE, out.width=“100%”, fig.height=3.5, fig.width=5.5}
## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Min genes per threshold: 0
## Min genes per threshold: 0